Designing secondary structure profiles for fast ncRNA identification.

Authors

  • Yanni Sun
  • Jeremy Buhler
Abstract

Detecting non-coding RNAs (ncRNAs) in genomic DNA is an important part of annotation. However, the most widely used tool for modeling ncRNA families, the covariance model (CM), incurs a high computational cost when used for search. This cost can be reduced by using a filter to exclude sequence that is unlikely to contain the ncRNA of interest, applying the CM only where it is likely to match strongly. Despite recent advances, designing an efficient filter that can detect nearly all ncRNA instances while excluding most irrelevant sequences remains challenging. This work proposes a systematic procedure to convert a CM for an ncRNA family to a secondary structure profile (SSP), which augments a conservation profile with secondary structure information but can still be efficiently scanned against long sequences. We use dynamic programming to estimate an SSP's sensitivity and false-positive (FP) rate, yielding an efficient, fully automated filter design algorithm. Our experiments demonstrate that designed SSP filters can achieve significant speedup over unfiltered CM search while maintaining high sensitivity for various ncRNA families, including those with and without strong sequence conservation. For highly structured ncRNA families, including secondary structure conservation yields better performance than using primary sequence conservation alone.
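To make the filtering idea concrete, here is a minimal Python sketch of a profile-based prefilter. It is not the authors' SSP construction or their filter design algorithm: the column scores, the flat base-pair bonus, the threshold, and the names `make_column`, `window_score`, and `filter_hits` are all assumptions chosen for illustration. It only shows the general pattern the abstract describes, in which a cheap profile augmented with pairing information is scanned along a sequence and only high-scoring windows are kept for the expensive CM.

```python
import math

# Base pairs counted as "paired" in this toy example (Watson-Crick plus G-U wobble).
PAIRABLE = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def make_column(preferred, p_match=0.85, background=0.25):
    """Build one log-odds profile column favouring a single base.

    The probabilities are illustrative assumptions, not values from the paper.
    """
    p_other = (1.0 - p_match) / 3.0
    return {
        base: math.log2((p_match if base == preferred else p_other) / background)
        for base in "ACGU"
    }


def window_score(window, columns, pairs, pair_bonus):
    """Score one window: per-column conservation plus a bonus per formed pair."""
    score = sum(col.get(base, 0.0) for col, base in zip(columns, window))
    for i, j in pairs:
        if (window[i], window[j]) in PAIRABLE:
            score += pair_bonus
    return score


def filter_hits(sequence, columns, pairs=(), pair_bonus=1.0, threshold=0.0):
    """Slide the profile along the sequence and keep windows at or above threshold.

    Returns (start, score) tuples; only these windows would be handed to the
    more expensive model (for example, a CM search) in a real pipeline.
    """
    width = len(columns)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = window_score(sequence[start:start + width], columns, pairs, pair_bonus)
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy hairpin-like profile with consensus GCAAGC; columns 0/5 and 1/4 are paired.
demo_columns = [make_column(base) for base in "GCAAGC"]
demo_pairs = [(0, 5), (1, 4)]
demo_hits = filter_hits("UUUGCAAGCUUU", demo_columns, demo_pairs, threshold=8.0)
print([start for start, _ in demo_hits])  # window starting at index 3 passes
```

In this sketch the threshold is picked by hand. The abstract instead describes using dynamic programming to estimate a profile's sensitivity and false-positive rate, which is what makes the filter design automatic.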


Similar resources

Grammar string: a novel ncRNA secondary structure representation

Multiple ncRNA alignment has important applications in homologous ncRNA consensus structure derivation, novel ncRNA identification, and known ncRNA classification. As many ncRNAs’ functions are determined by both their sequences and secondary structures, accurate ncRNA alignment algorithms must maximize both sequence and structural similarity simultaneously, incurring high computational cost. F...


Visualizing RNA Secondary Structure Base Pair Binding Probabilities using Nested Concave Hulls

Challenge 1 of the BIOVIS 2015 design contest consists of designing an intuitive visual depiction of base-pair binding probabilities for the secondary structure of ncRNA. Our representation depicts potential nucleotide pair bindings using nested concave hulls over the computed MFE ncRNA secondary structure. It thus makes it possible to identify regions with a high level of uncertainty in the MFE com...


ncRNA discovery and functional identification via sequence motifs

Non-coding RNAs play regulatory roles in gene expression via establishing stable joint structures with target mRNAs through complementary sequence motifs. Sequence motifs are also important determinants of the structure of ncRNAs. Here we introduce two computational tools that both exploit differential distributions of short sequence motifs in ncRNAs for the purpose of identifying their loci an...


In Silico Identification and Characterization of mRNA-Like Noncoding Transcripts in Medicago truncatula

Accumulating evidence suggests that non-coding RNAs (ncRNAs) play key roles in gene regulation and may form the basis of an inter-gene communication system. Many ncRNAs are synthesized similarly to mRNAs and can be detected through screening of polyA-rich EST or cDNA libraries. We developed a computational pipeline to screen EST and genomic sequence data for those transcribed genes with limi...


An Ariadne's thread to the identification and annotation of noncoding RNAs in eukaryotes

Non-protein coding RNAs (ncRNAs) have emerged as a vast and heterogeneous portion of eukaryotic transcriptomes. Several ncRNA families, either short (<200 nucleotides, nt) or long (>200 nt), have been described and implicated in a variety of biological processes, from translation to gene expression regulation and nuclear trafficking. Most probably, other families are still to be discovered. Com...



Journal title:
  • Computational systems bioinformatics. Computational Systems Bioinformatics Conference

Volume 7


Publication date: 2008